Time-series & cryptocurrencies

WARNING: this notebook relies on several R packages. Whenever one is called (for instance via a call such as "library(torch)"), this implies that the package has been installed, e.g., via "install.packages(\"torch\")". Hence, make sure the following packages are installed!

# install.packages("tidyverse") 
# install.packages(c("forecast", "quantmod", "crypto2", "RcppRoll", "rugarch", "fGarch"))
# install.packages(c("fable", "tsibble", "feasts", "keras", "highcharter", "plotly"))
library(tidyverse)

Crash course on time-series

Definitions & notations

Time-series are random variables indexed by time. Usually, they are written \(x_t\), \(X_t\), \(y_t\), or similar. The index \(t\) refers to time. The essential question in time-series analysis is to determine the law of motion of the process, as well as its characteristics.

Most often, the process integrates some form of randomness, which is usually written \(e_t\), \(\epsilon_t\) or \(\varepsilon_t\). There are two common simplifying assumptions that are made with respect to these quantities, which are sometimes referred to as innovations or residuals:

  1. they are normally distributed (with Gaussian law);
  2. they are independent from each other.

NOTE: in almost all models, innovations have zero mean, \(\mathbb{E}[e_t]=0\); hence, in the case of Gaussian innovations, only the variance \(\sigma^2\) needs to be specified. We will work under these (standard) assumptions below. Let’s generate a series of \(e_t\) and plot it below.

N <- 120
sigma <- 0.2
x <- rnorm(N, mean = 0, sd = sigma)
data.frame(t = 1:N, x = x) |>
  ggplot(aes(x = t, y = x)) + geom_line() + theme_minimal()

mean(x)
[1] 0.03684662
sd(x)
[1] 0.2058314

Examples

The random walk

The simplest case is the white noise: \[x_t= e_t,\]

but this is not very interesting. A second important case is linked to the notion of random walk: \[x_t=x_{t-1}+e_t,\] and, iterating, we see that we have \(x_t=\sum_{s=0}^t e_s\) - where we assume a starting point at \(t=0\). Notably, we have \[\mathbb{E}[x_t]=\mathbb{E}\left[\sum_{s=0}^t e_s\right]=\sum_{s=0}^t\mathbb{E}\left[ e_s\right]=0.\]

Moreover, if the innovations are independent,

\[\mathbb{V}[x_t]=\mathbb{V}\left[\sum_{s=0}^t e_s\right]=\sum_{s=0}^t\mathbb{V}\left[ e_s\right]=t\sigma^2.\]

Hence, the variance increases linearly with time. If perturbations are not independent, then it’s a bit more complicated and the variance will depend on all auto-covariances.

The covariance between two points in time is driven by the number of overlapping innovations:

\[\mathbb{C}ov[x_t,x_{t+h}] = \mathbb{E}\left[\sum_{s=0}^t\sum_{r=0}^{t+h} e_se_r\right]=t\sigma^2, \quad h \ge 0.\] This is because all innovation terms are independent, hence the only ones that are non-null are the \(t\) terms of the form \(\mathbb{C}ov[e_s,e_s]=\sigma^2\).

Hence, we see that both the variance and auto-covariance depend on \(t\), say, the present time. This makes analysis complicated, because, depending on the moment we consider, the distribution of \(x_t\) will change.

Let’s simulate one trajectory.

#set.seed(42) # This freezes the random number generator
x <- cumsum(rnorm(N, mean = 0, sd = sigma))
tibble(t = 1:N, x = x) |>
  ggplot(aes(x = t, y = x)) + geom_line() + theme_minimal()
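We can also check the linear growth of the variance numerically by simulating many independent trajectories (a sketch; the number of simulations and the checkpoints are arbitrary, and N and sigma are re-defined to keep the chunk self-contained):

```r
# Monte Carlo check that V[x_t] = t * sigma^2 for the random walk
set.seed(123)
n_sims <- 5000
N <- 120
sigma <- 0.2
# Each column is one simulated random-walk trajectory
paths <- replicate(n_sims, cumsum(rnorm(N, mean = 0, sd = sigma)))
# Empirical variance across simulations at t = 30, 60, 120
emp_var <- apply(paths[c(30, 60, 120), ], 1, var)
emp_var / (c(30, 60, 120) * sigma^2)   # ratios close to 1
```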

The auto-regressive model

Let us introduce a small modification to the above process and consider: \[x_t=a+\rho x_{t-1}+e_t, \quad a \in \mathbb{R}, \quad |\rho|< 1.\] This process is called the Auto-Regressive model of order one and is written AR(1). Iterating, we have: \[x_t = a + \rho (a + \rho x_{t-2}+e_{t-1})+e_{t}\] and, until infinity… \[x_t=a \sum_{k=0}^\infty\rho^k+ \sum_{k=0}^\infty \rho^ke_{t-k}=\frac{a}{1-\rho}+ \sum_{k=0}^\infty \rho^ke_{t-k},\] which explains why we need \(|\rho|<1\).

Again, we have

\[\mathbb{E}[x_t]=\frac{a}{1-\rho}, \] and when innovations are independent, \[\mathbb{V}[x_t]=\mathbb{V}\left[\sum_{k=0}^\infty \rho^ke_{t-k} \right]=\sum_{k=0}^\infty \mathbb{V}\left[\rho^ke_{t-k} \right]=\frac{\sigma^2}{1-\rho^2}.\]

In this case, we see that the mean and the variance do not depend on \(t\)! (This is natural, given the infinite-sum representation.)

Moreover,

\[\mathbb{C}ov[x_t,x_{t+h}]=\mathbb{C}ov\left[\sum_{k=0}^\infty \rho^ke_{t-k} ,\sum_{j=0}^\infty \rho^je_{t-j+h} \right]=\sum_{k=0}^\infty \sum_{j=0}^\infty \mathbb{C}ov\left[ \rho^ke_{t-k}, \rho^je_{t-j+h} \right]\] Again, we need to focus on the cases when the time indices are equal, so \[\mathbb{C}ov[x_t,x_{t+h}]=\sum_{k=0}^\infty \sum_{j=h}^\infty \mathbb{C}ov\left[ \rho^ke_{t-k}, \rho^{j-h}e_{t-j} \right]=\sum_{k=0}^\infty \sum_{j=h}^\infty \rho^{k+j-h}\mathbb{C}ov[e_{t-k},e_{t-j} ]\]

Thus, for \(j=k\), we get \[\mathbb{C}ov[x_t,x_{t+h}]=\sigma^2 \sum_{j=h}^\infty \rho^{2j-h}=\sigma^2 \rho^h \sum_{j=0}^\infty \rho^{2j}=\sigma^2\frac{\rho^h}{1-\rho^2}.\] Again, this does not depend on the time index \(t\). If innovations are Gaussian, as Gaussian distributions are entirely defined by their first two moments, this means that the full (unconditional) distribution of the process is invariant. This very convenient property is called stationarity, and it comes in various flavors (have a look online!). It is very suitable for analysis because now we know that the object we study (the underlying distribution) is stable in time.

The above quantity, which measures the link between different realizations of the same variable (at different points in time) is very important. Often, the covariance is scaled by the variance to yield the auto-correlation of the process. Relatedly, there is the auto-correlogram, which is defined, for stationary series, as

\[\gamma(h)=\mathbb{C}orr[x_t,x_{t+h}].\] For the AR(1), \(\gamma(h)=\rho^h\).
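The AR(1) moments derived above can be verified numerically with arima.sim() (a sketch; the parameter values are arbitrary):

```r
# Numerical check of the AR(1) mean and variance
set.seed(1)
a <- 0.5; rho <- 0.8; sigma <- 0.2
n <- 1e5
# arima.sim() simulates the zero-mean AR(1); we shift it by the mean a/(1-rho)
x <- a / (1 - rho) + arima.sim(list(ar = rho), n = n, sd = sigma)
c(mean(x), a / (1 - rho))            # both close to 2.5
c(var(x), sigma^2 / (1 - rho^2))     # both close to 0.04 / 0.36, about 0.111
```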

More generally, autoregressive models of higher orders (AR(p)) have memory that goes back further in time: \[x_t=a+\sum_{k=1}^p\rho_k x_{t-k}+e_t, \quad a \in \mathbb{R},\]

Even more generally, there are ARMA(p,q) processes, defined as \[x_t=a+\sum_{k=1}^p\rho_k x_{t-k} +\sum_{j=1}^q\delta_je_{t-j}+e_t, \quad a \in \mathbb{R},\]

but more on these is left to your curiosity.

Let’s simulate two such processes. We refer to the ARIMA page for details on the implementation. The order is \((p,d,q)\), like in the ARMA(p,q) process (the \(d\) stands for integration, a notion outside the scope of this course).

x_05 <- arima.sim(list(order = c(1,0,0), ar=.05), n = N)
x_95 <- arima.sim(list(order = c(1,0,0), ar=.95), n = N)
tibble(time = 1:N, x_05 = x_05, x_95 = x_95) |>
  pivot_longer(-time, names_to = "process", values_to = "values") |>
  ggplot(aes(x = time, y = values, color = process)) + geom_line() +
  theme_bw() +
  scale_color_manual(values = c("#DD9911", "#0066AA"))

One of the processes seems to be just white noise; the other evolves more slowly and oscillates much less.

Let’s have a look at the auto-correlogram of the series. In R, ACF means AutoCorrelation Function. There is a function that shows it in the form of a ggplot.

library(forecast) # For the ggAcf() function
ggAcf(x_95) + theme_bw() + 
  geom_function(fun = function(x) exp(x*log(0.95)), color = "#11DD33")

The values we extract from the sample are not exactly those predicted by the model. This comes from the size of the sample, which is too small. As it increases, the empirical values converge to the theoretical ones (you can try!).
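Trying it with a much longer simulated sample (a sketch):

```r
# With a long sample, the empirical ACF gets close to the theoretical rho^h
set.seed(9)
x_long <- arima.sim(list(ar = 0.95), n = 1e5)
acf_emp <- drop(acf(x_long, lag.max = 5, plot = FALSE)$acf)[-1]  # lags 1 to 5
round(cbind(empirical = acf_emp, theory = 0.95^(1:5)), 3)
```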

ARCH models

In practice, simple autoregressive models can be too limited because some variables are not exactly stationary. To illustrate this statement, let’s have a look at a volatility index, the VIX.

NOTE: usually, the volatility is the standard deviation of past returns. The VIX is slightly different, as it is based on forward-looking option prices. Still, it is linked (empirically) to the traditional volatility.

library(quantmod)
library(highcharter)
min_date <- "1998-01-01"                 # Starting date
max_date <- "2024-11-27"                 # Ending date
getSymbols("^VIX", src = 'yahoo',        # The data comes from Yahoo Finance
           from = min_date,              # Start date
           to = max_date,                # End date
           auto.assign = TRUE, 
           warnings = FALSE)
[1] "VIX"
VIX <- VIX |> as.data.frame() |>
  rownames_to_column("date") |>
  mutate(date = as.Date(date))
head(VIX,7)                                   # Have a look at the result!
date VIX.Open VIX.High VIX.Low VIX.Close VIX.Volume VIX.Adjusted
1998-01-02 24.34 24.93 23.42 23.42 0 23.42
1998-01-05 24.11 25.02 23.02 24.36 0 24.36
1998-01-06 25.20 25.97 24.87 25.66 0 25.66
1998-01-07 26.15 27.43 25.07 25.07 0 25.07
1998-01-08 26.24 26.70 25.62 26.01 0 26.01
1998-01-09 25.79 29.35 25.69 28.69 0 28.69
1998-01-12 28.69 31.08 28.02 28.02 0 28.02
vix_chart <- VIX |> select(VIX.Close)
rownames(vix_chart) <- VIX |> pull(date)
highchart(type = "stock") %>%
    hc_title(text = "Evolution of the VIX") %>%
    hc_add_series(as.xts(vix_chart)) %>%
    hc_tooltip(pointFormat = '{point.y:.3f}') 

Hence, we see that the series exhibits periods of extreme activity, and its distribution is not the same through time. This is referred to as “clusters of volatility”.

Therefore, we also need to introduce models that allow for this type of feature. In 1982, Robert Engle proposed such a model, called ARCH, which was generalized in 1986 to GARCH models. As we show below, there is a direct link with auto-regressive processes!

The idea is to work with the innovations, \(e_t\) and to specify them a bit differently, i.e., like this: \(e_t=\sigma_t z_t\), where \(\sigma_t\) is going to be a changing standard deviation and \(z_t\) is a simple white noise. What matters is that the vol term is modelled as \[\sigma^2_t = a+\sum_{j=1}^pd_j\sigma^2_{t-j} + \sum_{k=1}^qb_ke^2_{t-k},\] hence, the value of \(\sigma_t^2\) depends both on its past and on the past of squared innovations.
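The recursion above is easy to simulate by hand. The sketch below (with arbitrary parameter values satisfying \(b_1+d_1<1\)) generates a GARCH(1,1) series and displays the clusters of volatility:

```r
# Minimal GARCH(1,1) simulation, following the variance recursion above
set.seed(7)
n  <- 1000
a  <- 1e-5; b1 <- 0.1; d1 <- 0.85       # constant, ARCH and GARCH coefficients
z  <- rnorm(n)                          # white noise z_t
sig2 <- numeric(n); e <- numeric(n)
sig2[1] <- a / (1 - b1 - d1)            # start at the unconditional variance
e[1] <- sqrt(sig2[1]) * z[1]
for (t in 2:n) {
  sig2[t] <- a + d1 * sig2[t - 1] + b1 * e[t - 1]^2
  e[t] <- sqrt(sig2[t]) * z[t]          # e_t = sigma_t * z_t
}
plot(e, type = "l")                     # bursts of activity, then calm periods
```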

References

For time-series

There are many resources available online. We single out two below:

For cryptocurrencies

Application to cryptocurrencies

For simplicity, we will work with daily data. Higher frequencies can be obtained outside the crypto space via the highfrequency package, or by playing with dedicated APIs (e.g., with the binance or coinmarketcap packages).

Fetching data

We will use the crypto2 package below. If need be: install it.

library(crypto2)

First, let’s have a look at the list of available coins… which are numerous!

coins <- crypto_list() 

You can have a look at the info for the coins via the commented code below.

c_info <- crypto_info(coin_list = coins, limit = 30)
❯ Scraping crypto info
❯ Processing crypto info

Next, we can download historical quotes.

coin_names <- c("Bitcoin", "Ethereum", "Tether USDt", "BNB", "USDC", "Solana")
coin_hist <- crypto_history(coins |> filter(name %in% coin_names),
                            start_date = "20170101",
                            end_date = "20241206")
❯ Scraping historical crypto data
❯ Processing historical crypto data

Coin USDC does not have data available! Cont to next coin.

Coin Solana does not have data available! Cont to next coin.

coin_hist <- coin_hist |>  # Timestamps are at midnight, hence we add a day.
  mutate(date = 1 + as.Date(as.POSIXct(timestamp, origin="1970-01-01")))

And plot them. NOTE: mind the log-scale for the y-axis.
Also, we use plotly below for a better experience.

library(plotly)
g <- coin_hist |>
  ggplot(aes(x = date, y = close, color = name, label = symbol)) + 
  geom_line() +
  scale_y_log10() + theme_bw() + xlab("")
ggplotly(g)
coin_hist |>
  filter(symbol %in% c("USDT", "USDC")) |>
  ggplot(aes(x = date, y = close, color = name, label = symbol)) + 
  geom_line() + theme_classic() +
  theme(legend.position = c(0.8,0.8))

Now, the problem with some of these series is that they are NOT stationary: at the beginning of the sample, their values are much lower than those at the end. This is one of the reasons why, in finance, we look at returns: \[r_t=\frac{p_t-p_{t-1}}{p_{t-1}},\] which are simply relative price changes.
Let’s compute them below. We recall that prices are gathered in the close column.

coin_hist <- coin_hist |> 
  group_by(name) |>
  mutate(return = close / lag(close) - 1 ) |>
  ungroup()

And have a look at their distributions.

coin_hist |>
  ggplot(aes(x = return, fill = name)) + geom_histogram() +
  facet_wrap(vars(name), scales = "free") + theme_bw() +
  theme(axis.title = element_blank(),
        legend.position = c(0.85,0.2))

There seem to be outliers (to the left or right of the most common values)!

Estimations

Ok, so now let’s try to link what we have seen in the theoretical part with the data. This step is called estimation. Basically, if we think that the data follows \(x_t=bx_{t-1}+e_t\), we need to find \(b\)!

Returns

First, let’s have a look at autocorrelograms (of returns).

crypto_acf <-
  coin_hist |> 
  na.omit() %>%
  split(.$name) %>% 
  map(~acf(.$return, plot = F)) %>% 
  map_dfr(
    ~data.frame(lag = .$lag, 
                acf = .$acf,
                ci = qnorm(0.975)/sqrt(.$n.used)
    ), .id = "name") 
crypto_acf |> 
  filter(lag > 0, lag < 6) |>
  ggplot(aes(x = lag, y = acf, fill = name)) + 
  geom_col(position = "dodge") + theme_bw()

But recall that some of the coins are stable!
Overall, however, there does not seem to be a strong memory pattern. Hence, daily returns appear to be mostly white noise.
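Beyond eyeballing the autocorrelogram, the Ljung-Box test provides a formal check for serial correlation. Below is a sketch on simulated white noise; with the data above, x would be one coin’s return series (NAs removed):

```r
# Ljung-Box test: the null hypothesis is the absence of autocorrelation
# up to the chosen lag; a large p-value means no detectable memory
set.seed(42)
x <- rnorm(500)                              # stand-in for daily returns
Box.test(x, lag = 10, type = "Ljung-Box")
```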

Realized volatility

Then, let’s have a look at volatility. It’s measured as

\[v_t=\sqrt{\frac{1}{T-1}\sum_{s=1}^T(r_{t-s}-\bar{r})^2},\] where \(\bar{r}\) is the mean return in the sample of \(T\) observations. Below, we use an optimized (C++-based) function to compute it as a rolling standard deviation over 20 days.

library(RcppRoll)
coin_hist <- coin_hist |>
  group_by(name) |>
  na.omit() |>
  mutate(real_vol_20 = roll_sd(return, n = 20, fill = NA, na.rm = T))
coin_hist |>
  filter(name == "Bitcoin") |>
  pivot_longer(all_of(c("close", "real_vol_20")), names_to = "series", values_to = "value") |>
  ggplot(aes(x = date, y = value, color = series)) + geom_line() + theme_bw() +
  facet_wrap(vars(series), nrow = 2, scales = "free")# + scale_y_log10()
Warning: Removed 19 rows containing missing values or values outside the scale range
(`geom_line()`).
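As a sanity check, the same rolling standard deviation can be coded in base R (a sketch on a toy series; note that RcppRoll’s alignment conventions may differ, here the window ends at time \(t\)):

```r
# Base-R rolling standard deviation over windows of n observations
set.seed(1)
r <- rnorm(100, sd = 0.02)                          # toy return series
roll_sd_base <- function(x, n) {
  out <- rep(NA_real_, length(x))
  for (t in n:length(x)) out[t] <- sd(x[(t - n + 1):t])
  out
}
v <- roll_sd_base(r, 20)
sum(is.na(v))    # the first 19 observations have no full window
```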

Now let’s have a look at the autocorrelation of the volatility: it’s highly persistent.

crypto_acf <- coin_hist |> 
  na.omit() %>%
  split(.$name) %>% 
  map(~acf(.$real_vol_20, plot = F)) %>% 
  map_dfr(
    ~data.frame(lag = .$lag, 
                acf = .$acf,
                ci = qnorm(0.975)/sqrt(.$n.used)
    ), .id = "name") 
crypto_acf |> 
  filter(lag > 0, lag < 15) |>
  ggplot(aes(x = lag, y = acf, fill = name)) + 
  geom_col(position = "dodge", alpha = 0.8) + theme_bw() +
  scale_fill_viridis_d(option = "magma", end = 0.9, begin = 0.1)

The decay is slow, but there is heterogeneity between volatile coins (BTC, ETH) and stable coins (Tether, USD Coin).

GARCH estimation

There are many packages for GARCH estimation. We propose two: rugarch and fGarch.

library(rugarch)
library(fGarch)

In fGarch, the GARCH(1,1) variance equation is:

\[\sigma_t^2 = \omega + \alpha e_{t-1}^2 + \beta \sigma_{t-1}^2.\] In the output, there are some additional parameters that allow for a more general model:

  • shape is the shape parameter of the conditional distribution.
  • skew is the skewness parameter of the conditional distribution.
  • delta is the exponent \(\delta\) of the variance recursion.
garchFit(formula = ~garch(1,1), 
         data = coin_hist |> 
           filter(name == "Bitcoin") |> 
           pull(return)) 

Series Initialization:
 ARMA Model:                arma
 Formula Mean:              ~ arma(0, 0)
 GARCH Model:               garch
 Formula Variance:          ~ garch(1, 1)
 ARMA Order:                0 0
 Max ARMA Order:            0
 GARCH Order:               1 1
 Max GARCH Order:           1
 Maximum Order:             1
 Conditional Dist:          norm
 h.start:                   2
 llh.start:                 1
 Length of Series:          2895
 Recursion Init:            mci
 Series Scale:              0.03757189

Parameter Initialization:
 Initial Parameters:          $params
 Limits of Transformations:   $U, $V
 Which Parameters are Fixed?  $includes
 Parameter Matrix:
                     U           V     params includes
    mu     -0.60970732   0.6097073 0.06097073     TRUE
    omega   0.00000100 100.0000000 0.10000000     TRUE
    alpha1  0.00000001   1.0000000 0.10000000     TRUE
    gamma1 -0.99999999   1.0000000 0.10000000    FALSE
    beta1   0.00000001   1.0000000 0.80000000     TRUE
    delta   0.00000000   2.0000000 2.00000000    FALSE
    skew    0.10000000  10.0000000 1.00000000    FALSE
    shape   1.00000000  10.0000000 4.00000000    FALSE
 Index List of Parameters to be Optimized:
    mu  omega alpha1  beta1 
     1      2      3      5 
 Persistence:                  0.9 


--- START OF TRACE ---
Selected Algorithm: nlminb 

R coded nlminb Solver: 

  0:     3910.7388: 0.0609707 0.100000 0.100000 0.800000
  1:     3907.8662: 0.0609715 0.0856792 0.104016 0.796215
  2:     3902.4980: 0.0609725 0.0847558 0.116883 0.804530
  3:     3899.9022: 0.0609736 0.0706761 0.122960 0.805137
  4:     3898.4794: 0.0609751 0.0655524 0.128901 0.818328
  5:     3897.2824: 0.0609764 0.0565617 0.120838 0.827799
  6:     3896.5944: 0.0609780 0.0557467 0.110549 0.839157
  7:     3896.5471: 0.0609803 0.0524758 0.106831 0.843386
  8:     3896.4709: 0.0609825 0.0517019 0.107158 0.846334
  9:     3896.4420: 0.0609880 0.0523011 0.105795 0.846025
 10:     3896.4414: 0.0609880 0.0522266 0.105762 0.846006
 11:     3896.4410: 0.0609880 0.0522104 0.105769 0.846088
 12:     3896.4404: 0.0609889 0.0520995 0.105688 0.846182
 13:     3896.4390: 0.0609918 0.0520253 0.105556 0.846476
 14:     3896.4373: 0.0610037 0.0517828 0.105169 0.846924
 15:     3896.4367: 0.0610401 0.0517934 0.105025 0.847187
 16:     3896.4362: 0.0610799 0.0516689 0.105010 0.847266
 17:     3896.4359: 0.0611184 0.0516850 0.104831 0.847393
 18:     3896.4357: 0.0611576 0.0515571 0.104809 0.847525
 19:     3896.4357: 0.0611576 0.0515814 0.104816 0.847551
 20:     3896.4356: 0.0611576 0.0515678 0.104806 0.847544
 21:     3896.4355: 0.0611579 0.0515335 0.104702 0.847640
 22:     3896.4355: 0.0611754 0.0515199 0.104702 0.847683
 23:     3896.4355: 0.0611930 0.0514983 0.104692 0.847703
 24:     3896.4355: 0.0612107 0.0515011 0.104689 0.847716
 25:     3896.4354: 0.0613168 0.0514841 0.104643 0.847763
 26:     3896.4354: 0.0612807 0.0514254 0.104570 0.847895
 27:     3896.4354: 0.0612688 0.0514349 0.104533 0.847907
 28:     3896.4354: 0.0612809 0.0514312 0.104529 0.847913
 29:     3896.4354: 0.0612840 0.0514304 0.104528 0.847915

Final Estimate of the Negative LLH:
 LLH:  -5603.505    norm LLH:  -1.93558 
          mu        omega       alpha1        beta1 
2.302557e-03 7.260149e-05 1.045278e-01 8.479151e-01 

R-optimhess Difference Approximated Hessian Matrix:
                 mu         omega        alpha1         beta1
mu     -2772905.769 -5.640489e+04     -4362.609     -3194.073
omega    -56404.891 -6.416867e+10 -39904590.060 -63048934.538
alpha1    -4362.609 -3.990459e+07    -45598.987    -53166.599
beta1     -3194.073 -6.304893e+07    -53166.599    -74553.960
attr(,"time")
Time difference of 0.01187015 secs

--- END OF TRACE ---


Time to Estimate Parameters:
 Time difference of 0.1085908 secs

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(1, 1), data = pull(filter(coin_hist, 
    name == "Bitcoin"), return)) 

Mean and Variance Equation:
 data ~ garch(1, 1)
<environment: 0x10e8e72e0>
 [data = pull(filter(coin_hist, name == "Bitcoin"), return)]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1       beta1  
2.3026e-03  7.2601e-05  1.0453e-01  8.4792e-01  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     2.303e-03   6.006e-04    3.834 0.000126 ***
omega  7.260e-05   1.153e-05    6.296 3.06e-10 ***
alpha1 1.045e-01   1.370e-02    7.629 2.38e-14 ***
beta1  8.479e-01   1.759e-02   48.195  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 5603.505    normalized:  1.93558 

Description:
 Fri Dec  6 13:59:53 2024 by user:  

Now, with rugarch

spec <- ugarchspec()
ugarchfit(data = coin_hist |> 
            filter(name == "Bitcoin") |> 
            na.omit() |>
            pull(return), 
          spec = spec)@fit$coef
           mu           ar1           ma1         omega        alpha1 
 0.0022738222 -0.5164105580  0.4916640253  0.0000704438  0.1006337567 
        beta1 
 0.8526151099 

The results are not exactly the same! mu and omega are close, and so are the GARCH coefficients alpha1 and beta1. The extra ar1 and ma1 terms appear because the default ugarchspec() also fits an ARMA(1,1) model for the mean equation, on top of the GARCH(1,1) variance equation \(\sigma_t^2 = \omega + \alpha e_{t-1}^2 + \beta \sigma_{t-1}^2\).
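If one wants the two packages to estimate the same model, the ARMA terms in rugarch’s mean equation can be switched off when building the spec (a sketch; assumes the rugarch package is installed):

```r
library(rugarch)
# GARCH(1,1) with a constant-only mean equation, to match the fGarch setup
spec0 <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model     = list(armaOrder = c(0, 0), include.mean = TRUE),
  distribution.model = "norm"
)
```

Passing spec0 instead of the default spec to ugarchfit() should then yield coefficients much closer to those of garchFit().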

Forecasting

Of course, when it comes to cryptos, one of the most interesting (and hardest) topics is prediction. To show how tough this is, we will rely on the fable package.
It’s highly integrated and performs many analyses automatically. Below, we use an ARIMA model for the forecast; see the reference for the parameters: https://fable.tidyverts.org/reference/ARIMA.html. Beyond ARIMA, the available methods include ETS (exponential smoothing state space), TSLM (time-series linear model, \(y_t=a+bx_t\)), and NNETAR (neural network autoregression, \(y_t=f(y_{t-1})\)).

Below, we let the function pick the optimal orders via the function pdq(). But orders can also be specified (see below).

library(fable)
library(tsibble)
library(feasts)
coin_hist |>
  na.omit() |>
  filter(name == "Ethereum") |>
  tsibble(index = date) |>
  model(
    arima = ARIMA(return ~ 1 + pdq()),
    snaive = SNAIVE(return)
  ) |>
  forecast(h = 15) |>
  autoplot() + theme_bw() + facet_grid(vars(.model))

coin_hist |>
  na.omit() |>
  filter(name == "Ethereum") |>
  tsibble(index = date) |>
  model(
    arima = ARIMA(close ~ 1 + pdq()),
    snaive = SNAIVE(close)
  ) |>
  forecast(h = 15) |>
  autoplot() + theme_bw() + facet_grid(vars(.model))

Let’s have a look at the estimated coefficients from the models.

coin_hist |>
  na.omit() |>
  filter(symbol == "ETH") |>
  tsibble(index = date) |>
  model(
    arima = ARIMA(return ~ 1 + pdq()),
    snaive = SNAIVE(return)
  ) |>
  coef()
.model term estimate std.error statistic p.value
arima ar1 -0.5367215 0.0157484 -34.0809523 0.0000000
arima sar1 -0.0275681 0.0186520 -1.4780224 0.1395113
arima constant 0.0000109 0.0011181 0.0097921 0.9921878

Note: sar is a seasonal auto-regressive component.
We can specify a particular set of orders p, d and q.

coin_hist |>
  na.omit() |>
  filter(symbol == "ETH") |>
  tsibble(index = date) |>
  model(
    arima = ARIMA(return ~ 1 + pdq(1,0,1)),
    snaive = SNAIVE(return)
  ) |>
  coef()
.model term estimate std.error statistic p.value
arima ar1 -0.7551390 0.1820543 -4.147878 0.0000345
arima ma1 0.7271869 0.1893577 3.840281 0.0001255
arima constant 0.0056389 0.0015990 3.526490 0.0004277

Prediction with LSTM (RNN)

First, because we are using an M1 Mac, we must resort to a trick to make TensorFlow work. See the following post if you are also on a recent Mac.
We also build the time-series from which to learn.

library(reticulate)
use_condaenv("r-tensorflow") # This is for M1 Macs with special conda environment
library(TSLSTM)

data_coin <- coin_hist |> 
  select(timestamp, symbol, close, volume) |>
  filter(symbol %in% c("BTC", "ETH")) |>
  group_by(symbol) |>
  mutate(date = as.Date(timestamp),
         return = close / lag(close) - 1,
         vol_change = volume / lag(volume) - 1) |>
  ungroup() |>
  select(date, symbol, return, vol_change) |>
  pivot_wider(names_from = "symbol", values_from = c("return", "vol_change")) |>
  mutate(return_ETH = lead(return_ETH)) |> # We forward the ETH return, as we want to predict it
  na.omit()

train_coin <- data_coin |> filter(date < "2022-06-30")
test_coin <- data_coin |> filter(date > "2022-06-30")

For technical reasons, we need TensorFlow to run on the CPU and not the GPU (it crashes otherwise on my Mac!).

import tensorflow as tf
# Set the device to CPU
tf.config.set_visible_devices([], 'GPU')

The idea is to predict ETH return with past BTC return, past ETH volume change and past BTC volume change.

library(keras3)
model_rnn <- keras_model_sequential(input_shape =  c(nrow(train_coin), 3)) %>% 
  layer_gru(units = 9,                                     # Nb units in hidden layer
            name = "layer_1",
            kernel_initializer = 'glorot_uniform', 
            bias_initializer = 'glorot_uniform',
            #input_dim = c(1, nrow(train_coin), 3),
            #batch_input_shape = c(1, nrow(train_coin), 3), # Dimensions = tricky part!
            activation = 'tanh',                           # Activation function
            return_sequences = TRUE) %>%                   # Return all the sequence
  layer_dense(units = 1)                                   # Final aggregation layer
model_rnn %>% compile(
  loss = 'mean_squared_error',                             # Loss = quadratic
  optimizer = optimizer_rmsprop(),                         # Backprop
  metrics = c('mean_absolute_error')                       # Output metric MAE
)

Onto training!

fit_rnn <- model_rnn %>% 
  fit(x = train_coin |>                                           # Training features   
        select(return_BTC, vol_change_BTC, vol_change_ETH) |> 
        data.matrix() |>
        array(dim = c(1, nrow(train_coin), 3)),        
      y = train_coin |> pull(return_ETH) |> 
        array(dim  = c(1,nrow(train_coin))),                      # Training labels
      epochs = 15,                                                # Number of rounds
      batch_size = 1 ,                                            # Length of sequences
      verbose = 0)                                                # No comments
plot(fit_rnn) + theme_bw() + geom_point(color = "black") + geom_line()

Now, unfortunately, we cannot use the fitted model as such because the dimensions between the training and testing sets are not the same. We have to create a new model and import the old model’s weights…

new_model <- keras_model_sequential(input_shape =  c(nrow(train_coin), 3)) %>% 
  layer_gru(units = 9, 
            #batch_input_shape = c(1, nrow(test_coin), 3), # New dimensions 
            kernel_initializer='glorot_uniform', 
            bias_initializer='glorot_uniform',
            name = "layer_1",
            activation = 'tanh',                      # Activation function
            return_sequences = TRUE) %>%              # Return the full sequence
  layer_dense(units = 1)                              # Output dimension
new_model %>% keras3::set_weights(keras3::get_weights(model_rnn))

We are now ready.

preds_rnn <- predict(new_model, 
        test_coin |> 
          select(return_BTC, vol_change_BTC, vol_change_ETH) |> 
          data.matrix() |>
          array(dim = c(1, nrow(test_coin), 3)))
1/1 - 0s - 117ms/step
mean(abs(preds_rnn - test_coin |> pull(return_ETH) ))
[1] 0.04490194

Note: because of the random initialization of the weights, the result itself is random.
It should be compared with the training metric…
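To put this test MAE in context, a naive benchmark is useful: always predicting a zero return, which with the objects above would give mean(abs(test_coin$return_ETH)). A self-contained toy illustration (with simulated stand-in returns):

```r
# Naive "no change" benchmark: predict a zero return every day
set.seed(3)
y_true <- rnorm(200, sd = 0.045)   # stand-in for daily ETH returns
y_pred <- rep(0, length(y_true))   # the naive forecast
mean(abs(y_pred - y_true))         # naive MAE, the bar to beat
```

If the model’s MAE is not below this benchmark, the network has not learned anything useful out of sample.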